Abstract - Increased investments in telecommunications
companies (5G/6G, IoT, convergence of e-services with telecom
services), regulatory constraints and increased competition
drive automation within telecom operators’ procedures to
improve flexibility and competitiveness. Among these, sales
operations and processes require a better understanding of the 
client's profile and customer needs before selecting and proposing 
the “right” product at the “best” price, choosing from an 
increasingly complex collection of offers and tariff packages. 


To this end, various methods are employed to understand and
estimate the user's behavior, predict traffic and gauge
willingness to pay. Based on such information, sales channels
must select and propose a product at an acceptable price, which
will both be attractive to the customer and provide
satisfactory and sustainable revenue to the company, ensuring a
long-lasting relationship between the operator and the customer.


Such methods require complex machine learning exploration and
exploitation algorithms, applied to high volumes of data of
varying complexity, with the additional requirement that the
whole process must be executed instantly so that the salesperson
can access the proposed offers while the customer is in front of
them. Furthermore, the outcome of the interaction must be fed
back to the system to enable continuous updating of data, events
and finally offers.


The AutoSPRice project of Neurocom resulted in a fully
automated and autonomous revenue and customer lifecycle value
management system, which produces the most appropriate offers
for each sales channel in both batch and real-time conditions. The
system can predict, using the appropriate machine learning
algorithms, how future telecom subscriber needs will evolve at
individual and group level (company, family) in the short and
medium term.


It also assesses the user’s financial capacity and willingness to
buy across different offerings. Hence, it enables sales channels
to identify the product that best meets a customer's needs by
using reinforcement learning. Combined with systems that can
perform big data management and processing, it provides
immediate responses from large datasets.


Also, using exploration-exploitation algorithms, it identifies the
price with the best chance of being accepted while keeping
sustainable revenue and profit for the provider. All results can be
immediately visualized to improve the ability of users and sales
channels to compare and evaluate their offers.


The system has been developed and tested extensively under 
realistic usage scenarios, to become an innovative revenue 
management and price optimization system for the 
telecommunications sector with features that can further appeal 
to more service areas (entertainment, energy). 


Keywords— telecom usage forecasting, price optimization, 
machine learning, price recommendation, reinforcement learning 


DEFINITION OF TERMS 


Rate Plan: The rate plan (mobile or fixed plan) describes how 
the charges are applied. It includes pricing policies that dictate 
what prices should apply for each service, plus free module packs. 
There are two types of rate plans (both have the same structure 
but represent different concepts): base rate plans and addon rate 
plans. 


Base Rate Plan: It is the basic plan associated with a 
customer. Only one basic plan can be awarded per customer. 


ADDON or Addon Rate Plan: An addon plan provides 
additional bundles on top of the base rate plan. It provides 
additional free minutes, short messages and/or data allowances for 
a certain fee. A client can have none, one or more addons. 


Onnet: Telephony traffic where the origin and the destination
belong to the same telecommunications provider.


Offnet: Telephony traffic where the origin and the destination
belong to different telecommunications providers.


I. INTRODUCTION 


With the increasing use of smartphones and other 
connected mobile devices, there has been a surge in the 
amount of data flowing through the networks of telecom 
operators. They need to rapidly store, process, and extract 
useful insights from the available data. 


Big data processing technologies can help telecom
companies to increase profitability by helping optimize
network usage and services, enhance customer experience,
and improve security. Big data processing technologies
also provide the telecommunications industry with access
to new opportunities. They can improve the quality of service
and route traffic more effectively. By analyzing call data
records in real-time, telecom operators can also identify
fraudulent behavior and act on it immediately. This
ultimately gives them a competitive advantage in the
market and helps uncover hidden potential. Our work
addresses another big data exploitation task for
telecoms, namely price optimization, which is of utmost
importance for reducing churn and increasing
revenue.


Price Optimization - With competition between 
telecommunication providers intensifying, it is crucial for 
telecom operators to set optimal prices for their products 
and services. With the help of data analytics, telecom 
operators can gain accurate data insights and create optimal 
pricing strategies by analyzing customers’ reactions to 
different pricing strategies, purchase history, and 
competitor pricing. 


Increasing Revenue - Telecom providers can identify 
the perceived value of their product or services, and 
improve their sales team’s effectiveness. Optimizing the 
pricing strategy based on profit and revenue earned can 
boost sales, help gain more customers, and most 
importantly, retain loyal customers. Big data for the telecom
industry helps companies retain customers by offering new
services and content. But how do they know what their
customers want? Big data analytics helps telecom
companies to build a customer persona and infer the needs
and interests of their customers. The right content and
flexible offerings powered by price optimization retain existing
customers and increase the operator’s revenues.


The current research was performed under the title
“Autonomous System for A) Selection B) Pricing C) Real
Time Commission Calculation of telecom products using
Machine Learning and Big Data”, briefly named
AutoSPRice. The purpose of AutoSPRice was to create an
automatic system that creates strategies for
telecommunications companies where phone plans are
offered to customers at the ‘right price’ in order to increase
the company's profit.


The project was divided into five parts. The first one
developed an artificial intelligence system that analyzes
historical customer behavior data and predicts how it will
change in the medium or long-term future. In particular, the
system recognizes patterns in customer behavior and uses
these patterns to make predictions.


In the second part, an artificial intelligence system was
developed that generates offers to sell to customers.
Subsequent modules complete the project by integrating
the procurement calculation system, the visualization of the
results, and finally their evaluation and planning towards
exploitation.


This paper describes the first two parts: usage 
prediction and appropriate price calculation. 


II. HIGH LEVEL APPROACH


A. Overview 


We consider customers who are registered (or have a 
contract) with a telecommunications company. 


In order to address the usage prediction and appropriate 
price calculation the following steps have been followed: 


The first step is to identify customer profiles and exploit 
the characteristics of each profile to predict their respective 
needs. The profile of a customer, in this paper, mainly 
consists of behavioral characteristics related to the use of 
telecommunication media. By customer behavior, we mean 
consumption of a) call duration time, b) data volume and c) 
number of text messages, over a period of time. The 
concept of time period depends on the type of customers 
the system is targeting. For example, for prepaid customers, 
the time period is the day; for contract customers, the time 
period is a month. 


The second step after profiling customers is to exploit 
the information available about them in order to propose an 
attractive offer that will optimize the company’s profit. 


To adjust the price in a meaningful way, an additional 
characteristic that is considered is the calculation of the 
probability that they will voluntarily unsubscribe (churn). 
If a customer wants to churn - that is, they want to end their 
contract with the telecommunications company - they will 
be offered a product at a highly competitive price, in an 
attempt to extend the customer relationship with the 
company. 


Once the profile has been identified, and given the churn
probability of each customer, the process of creating offers
is divided into two stages. In the first stage, we calculate an
‘attractiveness metric’ for each commercially available
offering and customer. In the second stage, a price is
calculated for each offering.


The tools we have used in both stages are supervised,
unsupervised and reinforcement machine learning models,
including neural networks, decision trees and Q-learning
with neural networks.


Although the work involved the identification of 
telecom offers, it is feasible to adapt the algorithms to other 
similar services and utilities, like power and water supply. 


B. The Data 


To test the outcome of the selected algorithms we 
created synthetic data that consists of usage consumption 
and invoice fees for 4500 customers. 


We have generated three usage types over time for 12
months: a) consumption with an uptrend, b) consumption with
a downtrend, and c) a constant trend with small perturbations.
The trend for each customer is drawn at random from the
uniform distribution. We also assigned to each customer a
base rate plan and/or ADDONs.


Furthermore, we randomly defined the type of 
customers with regards to usage consumption: 


1. Voice calls 
2. International Voice Calls 
3. DATA usage 


Each type is chosen again randomly according to the 
uniform distribution function. Having the choices described 
above, we generated the usage consumption, and we 
immediately calculated the invoice fees (in detail) for each 
customer over a period of 18 months. 


To generate the data consumption for each type of
customer we performed a Monte Carlo simulation with
mean value equal to half of each rate plan allowance (per
corresponding customer type) and standard deviation equal to
the difference between the mean value and the total allowance.
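
The sampling rule above can be sketched in a few lines. This is only an illustration: the text does not name the sampling distribution, so a normal distribution clipped at zero is assumed here, and the names `simulate_usage` and `allowance` are hypothetical. Trends are not modeled in this sketch.

```python
import random


def simulate_usage(allowance, months=18, seed=None):
    """Draw monthly usage for one customer and one service.

    The mean is half the rate plan allowance and the standard deviation
    is the difference between the allowance and the mean, as in the text.
    The normal distribution and the clipping at zero are assumptions.
    """
    rng = random.Random(seed)
    mean = allowance / 2.0
    std = allowance - mean
    return [max(0.0, rng.gauss(mean, std)) for _ in range(months)]


# Example: a plan with a 10,000 MB data allowance.
usage = simulate_usage(allowance=10_000, months=18, seed=42)
```

Fixing the seed makes a generated data set reproducible, which helps when comparing prediction runs.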


C. The Process 


The algorithmic process is divided into three phases. In 
the first phase the system collects all appropriate data for all 
targeted customers. The second phase is devoted to the 
training and predictions of data. The third phase calculates 
the offer. We note that the offer is a product at a discounted 
price. Each product corresponds to a scenario. 


Each customer receives an invoice every bill cycle. The
interval between two bill cycles of a customer is a
month. Normally there are several invoice dates within a
month. We define the current invoice date for a customer to
be the latest invoice date for that customer. Hence, in the
first phase the system collects, for each customer, the
aggregated fees, CDRs, ADDON names and discount names within
the last 12 bill cycles up to the current invoice date. The rate
plan, ADDONs and discounts will be used later in the tariff
engine simulator. The other attributes are discussed below.


In the second phase the system collects the historical
CDRs of the customers and applies the normalization
algorithm. The resulting data are fed into the usage
prediction algorithm, described in item 1) of the algorithmic
details. The result is the forecast of the monthly average of
CDRs for each customer. At this stage the system calls the
tariff engine simulator, item 2), and feeds it with the
predicted CDRs, ADDONs, and discounts. The result of the
simulator is a prediction of fees with respect to each
scenario. To use the adoption probability algorithm, item 3),
the system needs historical adoptions for training. Then, the
predicted fees described above are used to get a probability
rate for each scenario and each customer.


In the final stage, the best product with an optimal price 
is calculated to offer to customers. 


III. ALGORITHMIC DETAILS


The algorithmic stages that lead to the appropriate offer 
identification are the following: 


1) Prediction of usage behavior algorithm 


The aim of this algorithm is to make a prediction of the 
future monthly average of usage behavior for each 
customer. By usage behavior we mean the consumption 
made by the customers in a period of a billing cycle 
(defined as the period starting from an invoice date until the 
expiration of the next). All these data are gathered in an 
aggregated form, the Call Detail Records (CDRs). Thus, 
the algorithm collects historical CDRs of each customer in 
a time interval of the past 12 months and predicts their 
future monthly average usage in the form of ‘virtual’ 
CDRs. 


The first part of the algorithm is a classification of CDRs 
into services. Then, we treat each service separately. Each 
service per customer is considered as a time series. 


The first step normalizes each timestep of the timeseries in
case it corresponds to an outlier. Sultan-Ali-Zhang
proposed a similar algorithm for anomaly detection in a
timeseries [1]. The system can identify the outliers
automatically by grouping the timesteps into clusters using
the KMeans [2] clustering algorithm. The cluster with the
smallest cardinality consists of candidate outliers. If a
candidate outlier differs from the average of the other
timesteps by a predefined threshold $\theta$, then it is
considered an outlier and it is substituted by the average value.


The algorithm is then applied. For each customer and
service, a timeseries of usage volume (or CDRs) is
considered for the last 12 months. We denote this timeseries
by X and let C be the number of non-zero
elements of X.


Normalization Algorithm
inputs = (X, C, θ, θ_min, θ_max)
M = avg(X)
[cl_1, cl_2, cl_3] = KMeans(X)
cl = argmin{#{cl_1}, #{cl_2}, #{cl_3}}
if #{cl} < θ * C:
    for j in cl:
        if θ_min ≤ j / M ≤ θ_max:
            replace j by M
output = X
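
A runnable sketch of the normalization step is given below. It is an illustration only: a tiny one-dimensional k-means stands in for the KMeans library call, the initial centers are spread evenly over the value range, and the outlier test is assumed to compare the ratio of a value to the series average against the band [theta_min, theta_max]. The function names are hypothetical.

```python
from statistics import mean


def kmeans_1d(values, k=3, iters=50):
    """Tiny 1-D k-means; returns one cluster label per value."""
    lo, hi = min(values), max(values)
    # Assumption: initial centers are spread evenly over the value range.
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centers[c]))
                  for v in values]
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = mean(members)
    return labels


def normalize(x, theta, theta_min, theta_max):
    """Replace outliers found in the smallest cluster by the series mean."""
    m = mean(x)                                  # M in the pseudocode
    count_nonzero = sum(1 for v in x if v != 0)  # C in the pseudocode
    labels = kmeans_1d(x)
    sizes = {c: labels.count(c) for c in set(labels)}  # non-empty clusters
    smallest = min(sizes, key=sizes.get)
    out = list(x)
    if sizes[smallest] < theta * count_nonzero:
        for i, v in enumerate(x):
            # Assumed test: ratio of the value to the average lies in a band.
            if labels[i] == smallest and m > 0 and theta_min <= v / m <= theta_max:
                out[i] = m
    return out


# Twelve months of usage with one obvious spike in month 7.
x = [10, 11, 9, 10, 12, 10, 95, 11, 10, 9, 10, 11]
cleaned = normalize(x, theta=0.25, theta_min=2.0, theta_max=100.0)
```

In this example only the spike is replaced by the average; all other months are left untouched.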


After the timeseries are normalized, they are fed into a
TensorFlow neural network. The neural network consists
of a TimeDistributed layer, a bidirectional layer, and a
dense layer. If the neural network is denoted by f, then
$$X \mapsto f(X).$$


2) Tariff engine simulator 


The scope of this algorithm is to answer the following 
question: If a customer buys a product, will he/she be 
charged an extra amount of money that corresponds to the 
above bundle usage? 


The tariff engine is a tool that creates ‘what-if’
scenarios for each customer and product, resulting in above
bundle usage charges. More particularly, the system
collects all commercially available products with their
characteristics such as nominal price, free units, and tariffs.
It also collects the predicted CDRs from the previous step.
Then, the tariff engine performs a simulation for every
scenario, where a scenario is a pair of a customer and a
product, and runs over all customers and products.
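
The what-if simulation can be illustrated with a minimal sketch. The product structure (`nominal_price`, `free_units`, `tariffs`), the service keys and the flat per-unit tariff are assumptions made for the example; a production rating engine would handle many more cases.

```python
def above_bundle_charge(predicted_usage, product):
    """Charge for usage beyond the free units of a product, per service."""
    charge = 0.0
    for service, used in predicted_usage.items():
        free = product["free_units"].get(service, 0.0)
        rate = product["tariffs"].get(service, 0.0)
        charge += max(0.0, used - free) * rate
    return charge


def simulate(customers, products):
    """Run every (customer, product) what-if scenario."""
    fees = {}
    for cust_id, usage in customers.items():
        for name, product in products.items():
            extra = above_bundle_charge(usage, product)
            fees[(cust_id, name)] = product["nominal_price"] + extra
    return fees


# Hypothetical catalogue with two products.
products = {
    "S": {"nominal_price": 10.0,
          "free_units": {"voice_min": 100, "data_mb": 1000},
          "tariffs": {"voice_min": 0.10, "data_mb": 0.01}},
    "L": {"nominal_price": 20.0,
          "free_units": {"voice_min": 500, "data_mb": 5000},
          "tariffs": {"voice_min": 0.05, "data_mb": 0.005}},
}
# Predicted monthly usage for one customer.
customers = {"c1": {"voice_min": 150, "data_mb": 3000}}
fees = simulate(customers, products)
```

Here the smaller product ends up more expensive for this customer once the above bundle charges are added, which is exactly the kind of effect the simulator is meant to expose.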


3) Adoption probability 


The aim of this step is to calculate the probability
of product adoption per customer. This probability is
known as Adoption Probability or Adoption Rate. The
adoption probability has been studied extensively (cf. [3],
[4]) using Bayesian methods. Our method is slightly
different and applies a sequential neural network to a
preprocessed set of data.


The system collects a monthly average of previous invoice
charges in detail, CDRs, and free units. Then, for every
commercially available product, it computes the difference
between these historical values and the corresponding future
charges, CDRs (as calculated by 1) and 2)), and free units.


The result is fed into a binary classification model. The
model consists of a simple neural network architecture whose
output layer has one node with the Sigmoid activation
function.
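
To make the output stage concrete, the sketch below evaluates a single sigmoid node on a vector of differences. The feature layout, the weights and the bias are illustrative values, not trained parameters, and the hidden layers of the actual network are omitted.

```python
import math


def sigmoid(z):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def adoption_probability(features, weights, bias):
    """Output node of a binary classifier: sigmoid of a weighted sum.

    In a full network `features` would be the activations of the last
    hidden layer; here the node is applied directly to the inputs.
    """
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return sigmoid(z)


# Assumed layout: differences (future minus historical) for charges,
# usage and free units, already scaled.
features = [-5.0, 0.2, 1.0]
weights = [-0.3, 0.5, 0.4]   # illustrative values, not trained
p = adoption_probability(features, weights, bias=-1.0)
```

Because the sigmoid output lies strictly between 0 and 1, it can be read directly as the adoption probability used in the next step.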


4) Calculate optimal price of best product 


The system computes the best product to offer to a 
customer, based on the adoption probability from the 
previous algorithm. The best product is chosen to be the 
one with the highest probability among the products that 
give higher predicted fees than the previous monthly 
average fees. 


We denote by P[customer, product] the adoption
probability from the previous algorithm. Also, we denote by
R(customer, product) the predicted revenue deduced
from the tariff engine simulator. Then, the best
product to be offered to each customer is computed as
follows:


Best product calculation
inputs = (customers, product_1, product_2, ...)
HR = (historical revenue per customer)
for each customer j:
    best_j = product_1
    for each product_i:
        if R(j, product_i) > HR(j)
           and
           P[j, product_i] > P[j, best_j]:
            best_j = product_i
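
The selection rule can be written as a short function. This sketch follows the rule stated in the text (highest adoption probability among the products whose predicted fees exceed the historical ones) and, as an assumption, returns `None` when no product qualifies instead of starting from a default product. All names and numbers are hypothetical.

```python
def best_products(customers, products, R, P, HR):
    """Pick, per customer, the most adoptable product that raises revenue.

    R[(j, p)] is the predicted revenue, P[(j, p)] the adoption
    probability and HR[j] the historical revenue of customer j.
    """
    best = {}
    for j in customers:
        choice = None
        for p in products:
            if R[(j, p)] <= HR[j]:
                continue  # this product would not increase revenue
            if choice is None or P[(j, p)] > P[(j, choice)]:
                choice = p
        best[j] = choice
    return best


customers = ["c1"]
products = ["A", "B", "C"]
R = {("c1", "A"): 18.0, ("c1", "B"): 25.0, ("c1", "C"): 30.0}
P = {("c1", "A"): 0.9, ("c1", "B"): 0.6, ("c1", "C"): 0.4}
HR = {"c1": 20.0}
best = best_products(customers, products, R, P, HR)
```

Product A has the highest adoption probability but is excluded because it would lower revenue, so product B is chosen.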


The price of each product has a nominal value. The aim of
this algorithm is to offer the best product at a discounted
price. However, the discounted price should not be less
than the monthly average of previous fees. Otherwise, the
product would be sold without a profit.


The calculation of the offered price is
based on a reinforcement learning algorithm, namely
Q-learning [9]. Several algorithms have been suggested for
price calculation through reinforcement learning (cf. [5],
[6], [7], [8]). The advantage of reinforcement learning
algorithms is that the system learns dynamically by
interacting with the environment. The environment in our
case is the set of customers. The interaction is an
exploration and exploitation process in which at every step
the system decides to make an offer based on previous
responses.


The main idea is that an agent at each state chooses an
action (from a set of predefined actions) to offer and then
receives a reward based on the response from the
environment in the current state. Then, the agent evaluates
the reward and in the next state it chooses a new action,
where the objective is to maximize the cumulative reward
in the long term. The choice of action made by the agent is
based on a Q-value. More particularly, for each state and
each action there is an associated Q-value which is
reevaluated after the reception of the reward. If $A$ is the set
of actions and $S$ is the set of states, then the Q-value is
considered as a function $Q: S \times A \to \mathbb{R}$. If $R$ is the
reward, then the Q-value is updated through
Bellman’s equation:


$$Q(s,a) = (1-\alpha) \times Q(s,a) + \alpha \times \left(R + \gamma \times Q(s',a)\right),$$
where $\alpha$ is the learning rate and $\gamma$ is a discount factor.
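
A single update step can be traced numerically. The sketch below follows the equation as printed, which bootstraps from the Q-value of the same action in the new state; the textbook Q-learning rule uses the maximum over all actions in the new state instead. The table contents and parameter values are illustrative.

```python
def q_update(q, s, a, reward, s_next, alpha, gamma):
    """One tabular update following the equation in the text."""
    target = reward + gamma * q[(s_next, a)]
    q[(s, a)] = (1 - alpha) * q[(s, a)] + alpha * target
    return q[(s, a)]


# Q-table keyed by (state, action); the values are illustrative.
q = {(0, 1): 0.5, (1, 1): 1.0}
new_value = q_update(q, s=0, a=1, reward=0.2, s_next=1, alpha=0.1, gamma=0.9)
# 0.9 * 0.5 + 0.1 * (0.2 + 0.9 * 1.0) = 0.56
```

With a small learning rate the old estimate dominates, so the Q-value moves only slightly towards the new target.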


The setup of the algorithm is as follows. The customers at
each time step are grouped into clusters based on the
adoption rate probability, as described in 3), and the
difference between previous actual fees and nominal target
fees (normalized from 0 to 1). If $AR$ is the adoption rate and
$DPRC$ is the percentage difference of the fees, then we
assign to each customer a value of the form:
$$\mathrm{dist} = AR^2 + DPRC^2.$$

We consider a tuple of thresholds $\theta = (\theta_1, \theta_2, \ldots, \theta_n)$.
Clusters are indexed by the intervals $[\theta_i, \theta_{i+1}]$. Thus, if
$\theta_i < \mathrm{dist} < \theta_{i+1}$, then the customer is assigned to the
corresponding cluster. These clusters correspond to states as
described above.
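
The mapping from a customer to a state amounts to locating dist in a sorted list of thresholds. In the sketch below the threshold values are hypothetical, and the treatment of values that fall on a boundary or outside the threshold range is an assumption, since the text does not specify it.

```python
from bisect import bisect_right


def assign_state(ar, dprc, thresholds):
    """Map a customer to a cluster index using dist = AR^2 + DPRC^2.

    `thresholds` must be sorted in ascending order. Index i means
    thresholds[i] <= dist < thresholds[i + 1]. Values outside the range
    are clamped to the first or last cluster (an assumption).
    """
    dist = ar ** 2 + dprc ** 2
    i = bisect_right(thresholds, dist) - 1
    return min(max(i, 0), len(thresholds) - 2)


thresholds = [0.0, 0.5, 1.0, 2.0]   # hypothetical values
state = assign_state(ar=0.8, dprc=0.3, thresholds=thresholds)
# dist = 0.64 + 0.09 = 0.73, which lies between 0.5 and 1.0
```

Since both inputs are normalized to the range from 0 to 1, dist never exceeds 2, which motivates the upper threshold chosen in the example.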


The set of actions is A and consists of discrete values that
correspond to the percentage revenue increase that a
customer is offered. The maximum increase should
not exceed the nominal price of a product. Thus, for each
customer, the available set of actions to choose is a subset
of A. Hence, the discount offered to a customer is the
difference between the revenue increase and the nominal
price of the product.


Q-learning pricing algorithm
inputs = (customers, A, γ, α, AR, θ, ε, nr)
Initiate Q-values
Set SA ⊆ A as a set of actions
For each customer and scenario:
    Get state s
    a = argmax over a' in SA of Q(s, a'),  with probability 1 - ε
        random action from SA,             with probability ε
    If a is accepted:
        R = exp(-(a - 0.6)/…)
    Else:
        R = nr
    R' (new revenue after a response of customer)
    s' (new state after a response of customer)
    If offer accepted:
        Q(s,a) = (1 - α) × Q(s,a) + α × (R + γ × Q(s',a))
    Otherwise:
        Q(s,a) = (1 - α) × Q(s,a) + α × R


In the algorithm above, the value nr stands for negative
reward, and it is the reward received when the customer
rejects the offer.
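
The loop above can be sketched as runnable code. This is an illustration under stated assumptions: the customer response, the reward for an accepted offer and the state transition are supplied as hypothetical callbacks (`accepts`, `reward_fn`, `next_state`), because the exact reward formula is not recoverable from the text. The simulated customers accept any upsell up to a fixed limit.

```python
import random


def choose_action(q, state, actions, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q.get((state, a), 0.0))


def run_pricing(customers, actions, accepts, reward_fn, next_state,
                alpha=0.1, gamma=0.9, epsilon=0.1, nr=-0.03, seed=0):
    """Tabular Q-learning over customers, following the pseudocode.

    `accepts`, `reward_fn` and `next_state` are hypothetical callbacks
    standing in for the customer response and the reward formula.
    """
    rng = random.Random(seed)
    q = {}
    rewards = []
    for cust in customers:
        s = cust["state"]
        a = choose_action(q, s, actions, epsilon, rng)
        old = q.get((s, a), 0.0)
        if accepts(cust, a):
            r = reward_fn(a)
            s2 = next_state(cust, a)
            q[(s, a)] = (1 - alpha) * old + alpha * (r + gamma * q.get((s2, a), 0.0))
        else:
            r = nr  # negative reward for a rejected offer
            q[(s, a)] = (1 - alpha) * old + alpha * r
        rewards.append(r)
    return q, rewards


# Simulated customers that accept any upsell of at most 10%.
customers = [{"state": i % 3, "limit": 0.10} for i in range(200)]
actions = [0.05, 0.10, 0.15, 0.20]
q, rewards = run_pricing(
    customers, actions,
    accepts=lambda c, a: a <= c["limit"],
    reward_fn=lambda a: a,               # assumed reward: the upsell itself
    next_state=lambda c, a: c["state"],  # assumed: the state does not change
)
```

In this toy setting, Q-values of upsells above the limit become negative after their first rejection, so the greedy choice stays on upsells that are accepted.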


IV. IMPLEMENTATION 


The implementation of the system is based on 
processing of data flows comprised of telecom usage events. 
Events move within the system, and are enriched with 
additional information about predicted service usage and 
charge estimates until the desired final result is produced. 


In our case, the structural elements of our architecture
form a directed acyclic graph (DAG), while the heart of the
architecture is a distributed processing system, which
ensures the reliable flow of processing even in the presence
of high-volume data.


These data are bounded data derived from data 
extraction for subsets of customers that have been grouped 
by customer segment. Importantly, the data is ingested into 
the distributed system in a chronological manner, ensuring 
a continuous time sequence of events for each 
user/customer. As a result of the batch processing nature of 
the dataset, our architecture leverages several notable 
advantages, including the ability to repeat calculations if the 
initial processing encounters any failures. 


A summary of the overall subsystem architecture is 
shown in the figure below: 
Figure 1 Product Identification and Price Recommendation System:
Implementation Architecture


The above figure focuses on those nodes that are pivotal 
parts of the processing. 


As an example, we can refer to the part of the processing
that interacts with the Usage Prediction Modeling. We can
also distinguish the part of the processing that calculates the
pricing of the different scenarios (Rating Cdr Traffic) and
finally the point where the most appropriate proposals are
selected for each user (Offer Inference Processor).


V. RESULTS 


To test the usage prediction algorithm, we used the synthetic 
data generated in Section II. 


Usage prediction algorithm. As explained in Section II,
each customer has up to 18 months of usage consumption.
The first 12 months are treated as known. We predict
the monthly average for the subsequent 6 months.


Figure 2 Predicted and Actual Usage Volume of Customers (scatter
plot of true vs. predicted usage in megabytes)


The scatter plot above shows the predicted and actual 
DATA usage volume of customers. Apart from a few high 
deviations, the predictions are close to the actual averages. 
Another view of the predictions is depicted in the following 
stacked bar plot. 


Figure 3 Classification of the differences between prediction and actual
data usage (stacked bar plot of the difference between actual and
predicted DATA usage)


In this plot, we classified the absolute differences between 
predicted and actual data into 6 buckets. 


Q-learning pricing algorithm. Suppose that for each
customer we have calculated the best product to offer. As
described in the pricing algorithm, for each customer the
system chooses a discount yielding a discounted offer
price. This offer price gives a percentage upsell over the
current fees of the customer. We denote this upsell by U.


To test whether U is an upsell accepted by the customer we 
create a custom upsell following a deterministic rule and we 
compare the two upsells. The rules are described below. 


Let X be a vector representing the historical average usage
charges of customers separated into services. The services
are:


• Onnet voice
• Offnet voice
• National SMS
• DATA
• International voice
• International SMS
• Pass through (uncategorized usage charges)


We concatenate X with the adoption probability of the best
product and its percentage difference of the nominal price
and the historical average fees of each customer. We denote
the result again by X. We take the dot product of X with the
vector


(2, 3, 0.00001, 3, 0.1, 0.1, 0.1, 0.1, 3, 2).


The result is multiplied by 0.0765 and then 0.084 is subtracted
from it. The final result, the custom upsell, is a number whose
absolute value lies between 0 and 1. If the absolute difference
between the upsell chosen by the algorithm and the custom upsell
is less than 5% then the customer accepts the system’s upsell.
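
The acceptance rule can be expressed directly in code. The weights and constants below are taken from the text; the ordering of the ten entries of the input vector is not fully specified there, so the input is treated as an opaque vector, and the 5% tolerance is interpreted as an absolute difference of 0.05 between two upsells expressed as fractions. Both points are assumptions.

```python
WEIGHTS = (2, 3, 0.00001, 3, 0.1, 0.1, 0.1, 0.1, 3, 2)


def custom_upsell(x):
    """Deterministic reference upsell: scaled dot product minus an offset."""
    if len(x) != len(WEIGHTS):
        raise ValueError("x must have one entry per weight")
    dot = sum(w * v for w, v in zip(WEIGHTS, x))
    return dot * 0.0765 - 0.084


def is_accepted(system_upsell, x, tolerance=0.05):
    """Accept when the two upsells differ by less than the tolerance."""
    return abs(system_upsell - custom_upsell(x)) < tolerance


# Hypothetical, already normalized feature vector.
x = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
reference = custom_upsell(x)          # 2 * 0.0765 - 0.084 = 0.069
accepted = is_accepted(0.10, x)       # |0.10 - 0.069| = 0.031 < 0.05
rejected = is_accepted(0.20, x)       # |0.20 - 0.069| = 0.131 >= 0.05
```

Because the rule is deterministic, repeated runs of the pricing algorithm can be compared on exactly the same simulated customer responses.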


We recall that each time the pricing algorithm offers a 
discounted price to a customer, a reward is received. Thus, 
to measure the performance of the pricing algorithm we 
evaluate the rewards received after the offer. The calculation 
of the reward is explicitly described in the Q-learning 
pricing algorithm. To evaluate the rewards, we choose -0.03 
as the negative reward (nr in the pricing algorithm). The 
picture below shows the reward per customer, after we make 
an offer. 
Figure 4 Algorithm Performance in terms of the reward


Since every offer is calculated customer by customer, to 
evaluate the total performance of the algorithm at each step 
we add the previous rewards. Hence, we obtain the 
cumulative reward. As long as the cumulative reward’s 
trend is not decreasing, we conclude that the algorithm’s 
performance is good. We can see the cumulative reward in 
the following picture. 
Figure 5 Algorithm Performance in terms of the cumulative reward
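
The cumulative reward is a running sum of the per-offer rewards. The sketch below computes it and adds a crude trend check; the text does not say how the trend is assessed, so comparing the average of the last and first windows is an assumption, as are the function names.

```python
from itertools import accumulate


def cumulative_rewards(rewards):
    """Running sum of per-offer rewards, in the order offers were made."""
    return list(accumulate(rewards))


def is_non_decreasing_overall(cumulative, window=50):
    """Crude trend check: compare the mean of the last and first windows."""
    if not cumulative:
        return True  # nothing to compare
    w = min(window, len(cumulative))
    first = sum(cumulative[:w]) / w
    last = sum(cumulative[-w:]) / w
    return last >= first


rewards = [1.0, -0.5, 2.0]
totals = cumulative_rewards(rewards)   # [1.0, 0.5, 2.5]
trend_ok = is_non_decreasing_overall(totals, window=1)
```

A single rejected offer lowers the running sum temporarily, which is why the check looks at windows rather than at consecutive values.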


VI. FURTHER WORK 


For the future evolution and improvement of the
algorithms the deep Q-learning algorithm should be
strongly considered, because the Q-learning pricing
algorithm requires the number of states to be defined
before execution begins. Each customer is
assigned to one of the available predefined clusters, that is,
the states. Then, the algorithm updates the set of Q-values
for each state.


Lately there have been several variants of Q-learning.
The most popular one is the combination of Q-learning and
deep neural networks [6]. In this latter case, the system does
not consider a predefined set of states, but rather a set of
attributes for each customer. It is very likely that deep
Q-learning can achieve better results, in the sense that it can
better capture the needs of customers
relative to the profit of the company. Below we briefly
describe how deep Q-learning works.


In the setup of a deep Q-learning algorithm we need a
set of attributes, namely X, of a customer, and a neural
network architecture that reads the structure of the
attributes. The output of the neural network is a vector of
dimension equal to the number of actions. Hence, each
coordinate of the output corresponds to a Q-value of a
particular action.


The analogue of the update of Q-values is achieved by
training the weights of the neural network as follows. We
initialize a set of weights for the neural network


$$f: X \to \mathbb{R}^{|A|},$$


where $A$ is the set of actions. After an action
$\arg\max\{f(X)\}$ is offered to a customer, we get a
negative or a positive reward $R$. Then, the training input
is the attribute vector of the customer, and the target
output is a vector with zero values except in the position
that corresponds to the action that was offered. If the
reward is negative, the latter value is $R$. If it is
positive, the value is a function of $R$ and the new state
$X'$ of the customer:


$$R + \gamma \times f(X')_a,$$


where $a$ is the action that was offered.
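
The construction of the training target can be sketched without any deep learning library. The function below builds the target vector for one offered action; `next_q_values` stands for the network output on the new customer state, and reading the bootstrap term at the index of the offered action is one interpretation of the formula above. The name `training_target` is hypothetical.

```python
def training_target(n_actions, action, reward, next_q_values, gamma):
    """Build the target vector for one offered action.

    All entries are zero except the offered action. A negative reward is
    used as is; a positive one is extended with the discounted network
    output for the new state at the same action index.
    """
    target = [0.0] * n_actions
    if reward < 0:
        target[action] = reward
    else:
        target[action] = reward + gamma * next_q_values[action]
    return target


next_q = [0.1, 0.2, 0.3]   # illustrative network output for the new state
rejected_target = training_target(3, 1, -0.03, next_q, gamma=0.9)
accepted_target = training_target(3, 1, 0.5, next_q, gamma=0.9)
# accepted_target[1] = 0.5 + 0.9 * 0.2 = 0.68
```

The pair of customer attributes and target vector then forms one training example for the network.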


The difficulty of deep Q-learning is an appropriate
choice of attributes X of a customer. If the dimension of X
is small, then technically, the system does not explore and
exploit all actions. Finally, an appropriate formulation of a
deep Q-learning algorithm can lead to a single algorithm
for choosing the best product and calculating the optimal
price.


Another path for further work is adaptation of the
solution architecture to other application domains. Usage
profiling and forecasting, willingness to pay estimation,
churn prediction and price recommendation (or
consumption pattern recommendation) are valuable
outcomes to achieve in similar big data environments for the
power and water supply industry. The core algorithmic staging
architecture and the relevant challenges will not differ
significantly; adaptations are expected in the specific
parameters of the algorithms and the selection of the
algorithms that will apply.